Multi-Armed Bandits on Unit Interval Graphs
Authors
Abstract
An online learning problem with side information on the similarity and dissimilarity across different actions is considered. The problem is formulated as a stochastic multi-armed bandit problem with a graph-structured learning space: each node in the graph represents an arm, and an edge between two nodes indicates that their mean rewards are close. It is shown that the resulting graph is a unit interval graph. A hierarchical learning policy is developed that achieves sublinear scaling of regret with the size of the learning space by fully exploiting the side information through an offline reduction of the learning space and an online aggregation of reward observations from similar arms. The order optimality of the proposed policy, in terms of both the size of the learning space and the length of the time horizon, is established through a matching lower bound on regret. It is further shown that when the mean rewards are bounded, complete learning with bounded regret over an infinite time horizon can be achieved. An extension to the case with only partial information on arm similarity and dissimilarity is also discussed.
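The paper's exact hierarchical policy is not reproduced on this page, but the aggregation idea the abstract describes can be illustrated with a minimal sketch: a UCB-style rule that pools reward samples across an arm's graph neighbors, since an edge certifies that two mean rewards are close. The function names, the gap parameter `eps`, and the pooling rule below are illustrative assumptions, not the paper's construction.

```python
import math

def neighbors(lefts, i):
    """Arms whose unit-length intervals overlap arm i's interval (an edge)."""
    return [j for j in range(len(lefts))
            if j != i and abs(lefts[i] - lefts[j]) < 1.0]

def aggregated_ucb(lefts, pull, horizon, eps=0.1):
    """UCB that pools samples across unit-interval-graph neighbors.

    lefts: left endpoints of each arm's unit interval; pull(i): one
    i.i.d. reward draw from arm i. eps bounds the mean gap across an edge.
    """
    n = len(lefts)
    counts, sums = [0] * n, [0.0] * n
    for i in range(n):  # initialization: pull every arm once
        sums[i] += pull(i)
        counts[i] += 1
    for t in range(n, horizon):
        def ucb(i):
            group = [i] + neighbors(lefts, i)
            c = sum(counts[j] for j in group)
            mean = sum(sums[j] for j in group) / c
            # Neighbor means are within eps of arm i's mean, so pooling
            # adds at most eps of bias while shrinking the confidence width.
            return mean + eps + math.sqrt(2.0 * math.log(t + 1) / c)
        i = max(range(n), key=ucb)
        sums[i] += pull(i)
        counts[i] += 1
    return max(range(n), key=lambda i: sums[i] / counts[i])
```

Pooling biases each estimate by at most `eps` but draws on many more samples, which is how this kind of policy trades a small, controlled bias for a large variance reduction.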
Similar Papers
On Top-k Selection in Multi-Armed Bandits and Hidden Bipartite Graphs
This paper discusses how to efficiently choose from n unknown distributions the k ones whose means are the greatest by a certain metric, up to a small relative error. We study the topic under two standard settings—multi-armed bandits and hidden bipartite graphs—which differ in the nature of the input distributions. In the former setting, each distribution can be sampled (in an i.i.d. manner) a...
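As a point of reference for the problem interface, here is a minimal sketch of the naive baseline for top-k selection: sample every distribution a fixed number of times and keep the k largest empirical means. The per-arm budget `m` is a placeholder; a PAC-style guarantee would need roughly O(log(n/δ)/ε²) samples per arm, and the point of such papers is precisely to beat uniform sampling.

```python
import heapq

def naive_top_k(arms, k, m, sample):
    """arms: list of arm ids; sample(a): one i.i.d. draw from arm a.

    Samples each arm m times and returns the k arms with the highest
    empirical means. A sketch of the baseline, not the paper's algorithm.
    """
    means = {a: sum(sample(a) for _ in range(m)) / m for a in arms}
    return heapq.nlargest(k, arms, key=means.get)
```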
Active Search and Bandits on Graphs using Sigma-Optimality
Many modern information access problems involve highly complex patterns that cannot be handled by traditional keyword-based search. Active Search is an emerging paradigm that helps users quickly find relevant information by efficiently collecting and learning from user feedback. We consider active search on graphs, where the nodes represent the set of instances users want to search over and the...
Unimodal Bandits
We consider multi-armed bandit problems where the expected reward is unimodal over partially ordered arms. In particular, the arms may belong to a continuous interval or correspond to vertices in a graph, where the graph structure represents similarity in rewards. The unimodality assumption has an important advantage: we can determine if a given arm is optimal by sampling the possible directions...
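A sketch of that idea on a line graph (the simplest unimodal setting): keep a running leader and only ever sample the leader and its immediate neighbors, moving the leader when a neighbor looks better. The exploration bonus and leader-update rule below are simplified placeholders, not the exact policy.

```python
import math

def unimodal_line_bandit(n, pull, horizon):
    """Local-search bandit on a line graph of n arms with unimodal means."""
    counts, sums = [0] * n, [0.0] * n
    leader = n // 2  # start from the middle of the line
    for t in range(horizon):
        # Candidates: the leader and its neighbors on the line graph.
        cand = [i for i in (leader - 1, leader, leader + 1) if 0 <= i < n]
        def ucb(i):
            if counts[i] == 0:
                return float("inf")  # sample unvisited candidates first
            return sums[i] / counts[i] + math.sqrt(2.0 * math.log(t + 1) / counts[i])
        i = max(cand, key=ucb)
        sums[i] += pull(i)
        counts[i] += 1
        # Move the leader to the empirically best sampled candidate.
        leader = max(cand, key=lambda j: sums[j] / counts[j] if counts[j] else -1.0)
    return leader
```

Unimodality is what makes the purely local comparison sound: if the leader is not the peak, one of its neighbors has a strictly higher mean.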
Modal Bandits
Analyses of multi-armed bandits primarily presume that the value of an arm is its expected reward. We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions.
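For discrete rewards, the shift from means to modes is easy to illustrate; the explore-then-commit sketch below estimates each arm's empirical mode and commits to the largest one. The sample budget `m` and the discrete-reward assumption are illustrative, and the confidence analysis for modal estimation, which is the substance of such a theory, is omitted.

```python
from collections import Counter

def empirical_mode(samples):
    """Most frequent value among the observed (discrete) rewards."""
    return Counter(samples).most_common(1)[0][0]

def modal_explore_then_commit(arms, m, sample):
    """Estimate each arm's mode from m i.i.d. draws; commit to the largest."""
    modes = {a: empirical_mode([sample(a) for _ in range(m)]) for a in arms}
    return max(arms, key=modes.get)
```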
A Generalized Gittins Index for a Class of Multiarmed Bandits with General Resource Requirements
We generalise classical multi-armed and restless bandits to allow for the distribution of a (fixed amount of a) divisible resource among the constituent bandits at each decision point. Bandit activation consumes amounts of the available resource which may vary by bandit and state. Any collection of bandits may be activated at any decision epoch provided they do not consume more resource than is...
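The activation step under a resource budget can be sketched independently of how the index is computed: given an index value and a resource cost per bandit, activate greedily in decreasing index-per-unit-resource order until the budget is exhausted. The index values are taken as given here; computing the generalized Gittins index itself is the paper's contribution and is not reproduced.

```python
def greedy_activation(indices, costs, budget):
    """indices[i]: index of bandit i in its current state;
    costs[i]: the (positive) resource it consumes if activated.

    A knapsack-style greedy sketch, not the paper's optimal rule.
    """
    order = sorted(range(len(indices)),
                   key=lambda i: indices[i] / costs[i], reverse=True)
    active, used = [], 0.0
    for i in order:
        if used + costs[i] <= budget:
            active.append(i)
            used += costs[i]
    return active
```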
Journal: CoRR
Volume: abs/1802.04339
Year: 2018